correspondence analysis
- North America > United States > California > Orange County > Irvine (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Montana (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- North America > United States > California > Orange County > Irvine (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
A comparison of correspondence analysis with PMI-based word embedding methods
Qi, Qianqian, Hessen, David J., van der Heijden, Peter G. M.
Popular word embedding methods such as GloVe and Word2Vec are related to the factorization of the pointwise mutual information (PMI) matrix. In this paper, we link correspondence analysis (CA) to the factorization of the PMI matrix. CA is a dimensionality reduction method that uses singular value decomposition (SVD), and we show that CA is mathematically close to the weighted factorization of the PMI matrix. In addition, we present variants of CA that turn out to be successful in the factorization of the word-context matrix, i.e. CA applied to a matrix where the entries undergo a square-root transformation (ROOT-CA) and a root-root transformation (ROOTROOT-CA). An empirical comparison among CA- and PMI-based methods shows that overall results of ROOT-CA and ROOTROOT-CA are slightly better than those of the PMI-based methods.
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Netherlands (0.04)
- Asia > Singapore (0.04)
- (8 more...)
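The link between CA and the PMI matrix described in the abstract above can be sketched numerically. The following is a minimal illustration in NumPy, not the authors' implementation: the toy count matrix, the function names `correspondence_analysis` and `pmi`, and the choice of two dimensions are assumptions made for the example. CA is computed here in the usual way, by an SVD of the standardized residuals of the correspondence matrix; the ROOT-CA and ROOTROOT-CA variants are obtained by transforming the counts before the same CA step.

```python
import numpy as np


def correspondence_analysis(counts, n_dims=2):
    """CA via SVD of standardized residuals (illustrative sketch)."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # independence model r c^T
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows and columns.
    row_coords = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :n_dims] * sv[:n_dims]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv


def pmi(counts):
    """Pointwise mutual information log(p_ij / (r_i c_j)); assumes no zero counts."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()
    return np.log(P / np.outer(P.sum(axis=1), P.sum(axis=0)))


# Hypothetical word-context counts (rows: words, columns: contexts).
counts = np.array([[10, 2, 3],
                   [4, 8, 1],
                   [2, 3, 9],
                   [6, 5, 5]])

row_coords, col_coords, sv = correspondence_analysis(counts)

# ROOT-CA and ROOTROOT-CA: the same CA applied to transformed entries.
root_rows, _, _ = correspondence_analysis(np.sqrt(counts))
rootroot_rows, _, _ = correspondence_analysis(counts ** 0.25)

# CA works with p_ij / (r_i c_j) - 1, the first-order approximation of PMI.
P = counts / counts.sum()
ratio_minus_one = P / np.outer(P.sum(axis=1), P.sum(axis=0)) - 1.0
pmi_matrix = pmi(counts)
```

Comparing `ratio_minus_one` with `pmi_matrix` shows where the two quantities are close (ratios near 1) and where they diverge, which is one way to read the abstract's statement that CA is mathematically close to a weighted factorization of the PMI matrix.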
Visualization of Extremely Sparse Contingency Table by Taxicab Correspondence Analysis: A Case Study of Textual Data
We present an overview of taxicab correspondence analysis, a robust variant of correspondence analysis, for the visualization of extremely sparse contingency tables. In particular, we visualize an extremely sparse textual data set of size 590 by 8265 concerning fragments of 8 sacred books, recently introduced by Sah and Fokoué (2019) and studied in considerable detail with (12 + 1) dimension reduction methods (t-SNE, UMAP, PHATE, ...) by Ma, Sun and Zou (2022).
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- Europe > France (0.04)
- Asia > India (0.04)
- Asia > China > Tibet Autonomous Region (0.04)
Combining Classifiers Using Correspondence Analysis
Several effective methods for improving the performance of a single learning algorithm have been developed recently. The general approach is to create a set of learned models by repeatedly applying the algorithm to different versions of the training data, and then combine the learned models' predictions according to a prescribed voting scheme. Little work has been done in combining the predictions of a collection of models generated by many learning algorithms having different representation and/or search strategies. This paper describes a method which uses the strategies of stacking and correspondence analysis to model the relationship between the learning examples and the way in which they are classified by a collection of learned models. A nearest neighbor method is then applied within the resulting representation to classify previously unseen examples.
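The general idea in the abstract above (stack the models' predictions, embed them with correspondence analysis, then classify by nearest neighbor in the embedded space) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the toy predictions, the helper names (`one_hot_votes`, `fit_ca`, `project`, `predict_1nn`), the use of two dimensions, and the choice of 1-nearest-neighbor over training examples are all assumptions made for the example.

```python
import numpy as np


def one_hot_votes(preds, n_classes):
    """Stack each model's predicted class as a one-hot block per example."""
    preds = np.asarray(preds)
    n, m = preds.shape
    Z = np.zeros((n, m * n_classes))
    for j in range(m):
        Z[np.arange(n), j * n_classes + preds[:, j]] = 1.0
    return Z


def fit_ca(Z, n_dims=2):
    """CA of the indicator matrix; assumes every column has at least one 1."""
    P = Z / Z.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    col_std = Vt.T[:, :n_dims] / np.sqrt(c)[:, None]                  # column standard coords
    row_coords = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]  # row principal coords
    return col_std, row_coords


def project(Z_new, col_std):
    """Place new examples via the CA transition formula (row profile times column standard coords)."""
    profiles = Z_new / Z_new.sum(axis=1, keepdims=True)
    return profiles @ col_std


def predict_1nn(Z_new, col_std, train_coords, train_labels):
    """Label each new example with the label of its nearest training example."""
    new_coords = project(Z_new, col_std)
    dists = np.linalg.norm(new_coords[:, None, :] - train_coords[None, :, :], axis=2)
    return train_labels[dists.argmin(axis=1)]


# Hypothetical predictions of 3 learned models on 9 training examples (3 classes).
train_preds = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 1],
                        [1, 1, 0], [2, 2, 2], [2, 1, 2],
                        [0, 0, 0], [1, 1, 1], [2, 2, 2]])
train_labels = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2])

Z_train = one_hot_votes(train_preds, n_classes=3)
col_std, train_coords = fit_ca(Z_train)

# An unseen example on which all three models vote for class 1.
Z_new = one_hot_votes(np.array([[1, 1, 1]]), n_classes=3)
predicted = predict_1nn(Z_new, col_std, train_coords, train_labels)
```

Projecting through the column coordinates means a new example needs only the models' predictions, not a refit of the decomposition.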
Multilingual textual data: an approach through multiple factor analysis
Kostov, Blechin, Alvarez-Esteban, Ramón, Bécue-Bertaut, Mónica, Husson, François
This paper focuses on the analysis of open-ended questions answered in different languages. Closed-ended questions, called contextual variables, are asked of all respondents in order to understand the relationships between the free and the closed responses among the different samples, since the latter presumably affect the word choices. We have developed "Multiple Factor Analysis on Generalized Aggregated Lexical Tables" (MFA-GALT) to jointly study the open-ended responses in different languages through the relationships between the choice of words and the variables that drive this choice. MFA-GALT studies whether variability among words is structured in the same way by variability among variables, and inversely, from one sample to another. An application to an international satisfaction survey shows the easy-to-interpret results that are obtained.
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Spain > Castile and León > León Province > León (0.04)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Research Report (0.64)
- Questionnaire & Opinion Survey (0.48)
Correspondence Analysis Using Neural Networks
Hsu, Hsiang, Salamatian, Salman, Calmon, Flavio P.
Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite-variance functions of two random variables. We show that this optimization problem, in turn, can be efficiently approximated by neural networks. The resulting formulation, called the correspondence analysis neural network (CA-NN), enables CA to be performed at an unprecedented scale. We validate the CA-NN on synthetic data, and demonstrate how it can be used to perform CA on a variety of datasets, including food recipes, wine compositions, and images. Our results outperform traditional methods used in CA, indicating that CA-NN can serve as a new, scalable tool for interpretability and visualization of complex dependencies between random variables.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- (10 more...)
Correspondence Analysis of Government Expenditure Patterns
Hsu, Hsiang, Calmon, Flavio P., Filho, José Cândido Silveira Santos, Calmon, Andre P., Salamatian, Salman
We analyze expenditure patterns of discretionary funds by Brazilian congress members. This analysis is based on a large dataset containing over 7 million expenses made publicly available by the Brazilian government. This dataset has, up to now, remained largely untouched by machine learning methods. Our main contributions are two-fold: (i) we provide a novel dataset benchmark for machine learning-based efforts for government transparency to the broader research community, and (ii) we introduce a neural network-based approach for analyzing and visualizing outlying expense patterns. Our hope is that the approach presented here can inspire new machine learning methodologies for government transparency applicable to other developing nations.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Mexico (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- (10 more...)